Transformers have achieved impressive success in various computer vision tasks. However, most existing studies need to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to reach satisfactory performance, and such datasets are usually unavailable for medical images. In addition, because of the gap between medical and natural images, the benefit of ImageNet-pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, the online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To make the most of the Transformer given limited medical data, we propose an auxiliary difficulty ranking task: the Transformer must identify which branch (online or target) is processing the more difficult perturbed tokens. Overall, the Transformer learns to distill transformation-invariant features from the perturbed tokens so that it can both measure difficulty and maintain the consistency of the self-supervised representations. BOLT is evaluated on three medical image processing tasks: skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results show that BOLT outperforms both ImageNet-pretrained weights and state-of-the-art self-supervised learning approaches for medical image classification.
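The online/target prediction objective described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the normalized-similarity loss and the exponential-moving-average target update are borrowed from the general bootstrap-your-own-latent recipe and are assumptions here, as are the function names `consistency_loss` and `ema_update` and the momentum value. The difficulty ranking task is not covered.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def consistency_loss(online_pred, target_repr):
    """Assumed loss: 2 - 2 * cosine similarity between the online prediction
    and the target representation. It is 0 when the two directions agree."""
    p = l2_normalize(online_pred)
    z = l2_normalize(target_repr)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))

def ema_update(target_params, online_params, momentum=0.99):
    """Assumed update rule: the target branch slowly tracks the online branch."""
    return [momentum * t + (1.0 - momentum) * o
            for t, o in zip(target_params, online_params)]
```

In a training loop under these assumptions, only the online branch would receive gradients from `consistency_loss`, and `ema_update` would be applied to the target branch after each optimizer step.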
During X-ray computed tomography (CT) scanning, metallic implants carried by patients often cause adverse artifacts in the captured CT images, which in turn impair clinical treatment. For this metal artifact reduction (MAR) task, existing deep-learning-based methods have achieved promising reconstruction performance. Nevertheless, there is still room to improve MAR performance and generalization ability, because some important prior knowledge underlying this specific task has not been fully exploited. In this paper, we carefully analyze the characteristics of metal artifacts and propose an orientation-shared convolution representation strategy that adapts to the physical prior structure of the artifacts, i.e., rotationally symmetrical streaking patterns. The proposed method adopts a Fourier-series-expansion-based filter parametrization for artifact modeling, which better separates artifacts from anatomical tissues and improves model generalizability. Comprehensive experiments on synthesized and clinical datasets show that our method preserves detail better than current representative MAR methods. Code will be available at https://github.com/hongwang01/OSCNet
Graph Neural Networks (GNNs) have become a prevailing technique for tackling various analysis tasks on graph data. The remarkable performance of GNNs rests on a key premise: complete and trustworthy initial graph descriptions (i.e., node features and graph structure). This premise is often not satisfied, since real-world graphs are frequently incomplete due to various unavoidable factors. In particular, GNNs face greater challenges when node features and graph structure are incomplete at the same time. Existing methods focus on either feature completion or structure completion. They usually rely on the matching relationship between features and structure, or employ joint learning of node representation and feature (or structure) completion in the hope of achieving mutual benefit. However, recent studies confirm that mutual interference between features and structure degrades GNN performance. When both features and structure are incomplete, the mismatch between them caused by random missingness exacerbates this interference, which may trigger incorrect completions that negatively affect node representation. To this end, we propose T2-GNN, a general GNN framework based on teacher-student distillation that improves the performance of GNNs on incomplete graphs. To avoid interference between features and structure, we design separate feature-level and structure-level teacher models that provide targeted guidance to the student model (a base GNN, such as GCN) through distillation. We then design two personalized methods to obtain well-trained feature and structure teachers. To ensure that the teachers' knowledge is comprehensively and effectively distilled into the student model, we further propose a dual distillation mode that enables the student to acquire as much expert knowledge as possible.
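One plausible reading of distilling from two separate teachers is a student loss that adds one soft-label term per teacher to the usual classification loss. The sketch below illustrates that reading only. The temperature-scaled KL terms, the weights `alpha` and `beta`, and the function name `dual_distillation_loss` are all assumptions; the dual distillation mode in T2-GNN itself may be defined differently.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def dual_distillation_loss(student_logits, feat_teacher_logits,
                           struct_teacher_logits, label,
                           temperature=2.0, alpha=0.5, beta=0.5):
    """Cross-entropy on the true label plus one KD term per teacher.
    alpha and beta are hypothetical weights for the two teachers."""
    student_soft = softmax(student_logits, temperature)
    ce = -math.log(softmax(student_logits)[label] + 1e-12)
    kd_feat = kl_divergence(softmax(feat_teacher_logits, temperature), student_soft)
    kd_struct = kl_divergence(softmax(struct_teacher_logits, temperature), student_soft)
    return ce + alpha * kd_feat + beta * kd_struct
```

Keeping the two teacher terms separate, rather than averaging the teachers first, mirrors the abstract's point that feature-level and structure-level guidance should not interfere with each other.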
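Accurate abnormality localization in chest X-rays (CXR) can benefit the clinical diagnosis of various thoracic diseases. However, lesion-level annotation can only be performed by experienced radiologists, and it is tedious and time-consuming, so it is difficult to obtain. This makes it hard to develop a fully supervised abnormality localization system for CXR. We therefore propose to train a CXR abnormality localization framework with a weakly semi-supervised strategy, termed Point Beyond Class (PBC), which uses a small number of fully annotated CXRs with lesion-level bounding boxes together with a large number of samples weakly annotated by points. Such point annotations provide weak instance-level information for abnormality localization at a marginal annotation cost. The core idea behind PBC is to learn a robust and accurate mapping from point annotations to bounding boxes that is insensitive to variation in the annotated points. To achieve this, we propose a regularization term, multi-point consistency, which drives the model to generate consistent bounding boxes from different point annotations inside the same abnormality. In addition, we propose a self-supervision scheme, termed symmetric consistency, to further exploit useful information in the weakly annotated data for abnormality localization. Experimental results on the RSNA and VinDr-CXR datasets demonstrate the effectiveness of the proposed method. When fewer than 20% of the box-level labels are used for training, PBC improves mAP by about 5 compared with the current state-of-the-art method (i.e., Point DETR). Code is available at https://github.com/haozheliu-st/point-beyond-class.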
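The multi-point consistency idea, that different points inside one abnormality should map to the same box, can be illustrated with a toy penalty. This is a sketch under stated assumptions: the uniform point sampling, the L1 deviation from the mean box, and the names `sample_points_in_box` and `multi_point_consistency` are illustrative choices, not the loss used in the paper.

```python
import random

def sample_points_in_box(box, n, rng):
    """Draw n point annotations uniformly inside a box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return [(rng.uniform(x1, x2), rng.uniform(y1, y2)) for _ in range(n)]

def multi_point_consistency(pred_boxes):
    """Mean L1 deviation of each predicted box from the average box.
    The value is 0 when every point yields the same box."""
    n = len(pred_boxes)
    mean_box = [sum(b[i] for b in pred_boxes) / n for i in range(4)]
    return sum(abs(b[i] - mean_box[i]) for b in pred_boxes for i in range(4)) / n

# Usage: a hypothetical point-to-box model would be called once per sampled
# point, and the resulting boxes passed to multi_point_consistency.
rng = random.Random(0)
points = sample_points_in_box((10.0, 20.0, 50.0, 80.0), n=4, rng=rng)
```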
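In recent years, generative adversarial networks (GANs) have shown compelling results in a variety of tasks and applications. However, mode collapse remains a critical problem for GANs. In this paper, we propose a novel training pipeline to address mode collapse in GANs. Unlike existing methods, we propose to generalize the discriminator as a feature embedding and to maximize the entropy of the distribution in the embedding space learned by the discriminator. Specifically, two regularization terms, Deep Local Linear Embedding (DLLE) and Deep Isometric feature Mapping (DIsoMap), are designed to encourage the discriminator to learn the structural information embedded in the data, so that the embedding space it learns is well formed. Based on this well-learned embedding space, a non-parametric entropy estimator is designed to efficiently maximize the entropy of the embedding vectors, which serves as an approximation to maximizing the entropy of the generated distribution. By improving the discriminator and maximizing the distance between the most similar samples in the embedding space, our pipeline effectively reduces mode collapse without sacrificing the quality of the generated samples. Extensive experimental results demonstrate the effectiveness of our method, which outperforms the GAN baseline MaF-GAN on CelebA (9.13 vs. 12.43) and surpasses the recent state-of-the-art energy-based model on the Anime-Face dataset (2.80 vs. 2.26 in Inception score).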
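The link between entropy and "the distance between the most similar samples" can be made concrete with a nearest-neighbor surrogate: the mean log distance from each embedding to its closest neighbor, which is the quantity classical nearest-neighbor entropy estimators are built on. The sketch below is an illustration under that assumption, with constants and dimension factors dropped. The function names are hypothetical and this is not the estimator defined in the paper.

```python
import math

def nearest_neighbor_distances(embeddings):
    """Euclidean distance from each embedding to its closest other embedding."""
    dists = []
    for i, a in enumerate(embeddings):
        best = min(math.dist(a, b) for j, b in enumerate(embeddings) if j != i)
        dists.append(best)
    return dists

def entropy_surrogate(embeddings, eps=1e-8):
    """Mean log nearest-neighbor distance. Larger values mean the batch is
    more spread out in the embedding space (assumed surrogate for entropy)."""
    nn = nearest_neighbor_distances(embeddings)
    return sum(math.log(d + eps) for d in nn) / len(nn)

# A nearly collapsed batch scores lower than a spread-out batch.
collapsed = [(0.0, 0.0), (0.0, 0.001), (0.001, 0.0)]
spread = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
```

Under these assumptions, a generator update would try to increase `entropy_surrogate` on the discriminator's embeddings of generated samples, which pushes near-duplicate samples apart.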
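This paper presents OmniCity, a new dataset for omnipotent city understanding from multi-level and multi-view images. More precisely, OmniCity contains multi-view satellite images as well as street-level panorama and mono-view images, constituting over 100K pixel-wise annotated images that are well aligned and collected from 25K geo-locations in New York City. To alleviate the substantial pixel-wise annotation effort, we propose an efficient street-view image annotation pipeline that leverages the existing label maps of the satellite view and the transformation relations between the different views (satellite, panorama, and mono-view). With the new OmniCity dataset, we provide benchmarks for a variety of tasks, including building footprint extraction, height estimation, and building plane/instance/fine-grained segmentation. We also analyze the impact of the view on each task, the performance of different models, the limitations of existing methods, and so on. Compared with existing multi-level and multi-view benchmarks, OmniCity contains a larger number of images with richer annotation types and more views, provides more baseline results obtained from state-of-the-art models, and introduces a novel task of fine-grained building instance segmentation on street-level panorama images. Moreover, OmniCity provides new problem settings for existing tasks, such as cross-view image matching, synthesis, segmentation, and detection, and facilitates the development of new methods for large-scale city understanding, reconstruction, and simulation. The OmniCity dataset and the benchmarks will be available at https://city-super.github.io/omnicity.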
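Few-shot class-incremental learning (FSCIL) aims to incrementally learn novel classes from a few labeled samples while avoiding overfitting and catastrophic forgetting. The current FSCIL protocol is built by mimicking the general class-incremental learning setting, which is not entirely appropriate because the data configuration differs: all novel classes are in a limited-data regime. In this paper, we rethink the configuration of FSCIL under the open-set hypothesis by reserving, in the first session, the possibility of incoming categories. To give the model better closed-set and open-set recognition performance, a Hyperbolic Reciprocal Point Learning module (Hyper-RPL) is built on Reciprocal Point Learning (RPL) with hyperbolic neural networks. In addition, to learn novel categories from limited labeled data, we incorporate a hyperbolic metric learning (Hyper-Metric) module into a distillation-based framework to alleviate overfitting and to better handle the trade-off between preserving old knowledge and acquiring new knowledge. The proposed configuration and modules are comprehensively evaluated on three benchmark datasets to validate their effectiveness with respect to three evaluation metrics.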
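Traffic forecasting is a canonical example of a spatial-temporal learning task in intelligent transportation systems. Existing approaches capture spatial dependency with a predetermined matrix in graph convolution neural operators. However, an explicit graph structure loses some hidden representations of the relationships among nodes. Furthermore, traditional graph convolution neural operators cannot aggregate long-range nodes on the graph. To overcome these limitations, we propose a novel network for traffic forecasting, Spatial-Temporal Adaptive graph convolution with Attention Network (STAAN). First, instead of using a predefined matrix during GCN processing, we adopt an adaptive dependency matrix to infer the interdependencies among nodes. Second, we combine PW-attention, which is based on the graph attention network and designed for global dependency, with GCN as the spatial block. In addition, our temporal block adopts stacked dilated 1D convolutions, which are efficient for long-term prediction, to capture the different time series. We evaluate STAAN on two real-world datasets, and the experiments show that our model outperforms state-of-the-art baselines.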
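Two of the building blocks named above, a learned dependency matrix and a dilated 1D convolution, can be sketched in a few lines. Both functions are illustrations under assumptions. Forming the matrix as a row-wise softmax over rectified products of two node-embedding tables is one common construction and may not match STAAN's, and the convolution is written as causal with left zero padding, which the abstract does not specify.

```python
import math

def adaptive_adjacency(src_emb, dst_emb):
    """Assumed form: row-wise softmax(ReLU(E1 . E2^T)) over learnable
    node embeddings, giving a dense, data-driven dependency matrix."""
    adj = []
    for e1 in src_emb:
        scores = [max(0.0, sum(a * b for a, b in zip(e1, e2))) for e2 in dst_emb]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        adj.append([e / total for e in exps])
    return adj

def dilated_conv1d(series, kernel, dilation):
    """Causal dilated 1D convolution: output[t] combines series[t],
    series[t - dilation], series[t - 2 * dilation], and so on."""
    out = []
    for t in range(len(series)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # positions before the start count as zero
                acc += w * series[idx]
        out.append(acc)
    return out
```

Stacking such convolutions with dilations 1, 2, 4, and so on grows the receptive field exponentially with depth, which is the usual argument for their efficiency on long horizons.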
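Accurate brain tumor segmentation from magnetic resonance imaging (MRI) benefits from joint learning over multimodal images. In clinical practice, however, it is not always possible to acquire a complete set of MRIs, and the problem of missing modalities causes severe performance degradation in existing multimodal segmentation methods. In this work, we present the first attempt to exploit the Transformer for multimodal brain tumor segmentation that is robust to any combinatorial subset of available modalities. Specifically, we propose a novel multimodal Medical Transformer (mmFormer) for incomplete multimodal learning with three main components: hybrid modality-specific encoders that bridge a convolutional encoder and an intra-modal Transformer for both local and global context modeling within each modality; an inter-modal Transformer that builds and aligns long-range correlations across modalities to obtain modality-invariant features with global semantics corresponding to the tumor region; and a decoder that performs progressive up-sampling and fusion with the modality-invariant features to generate robust segmentation. In addition, auxiliary regularizers are introduced in both the encoder and the decoder to further enhance the model's robustness to incomplete modalities. We conduct extensive experiments on the public BraTS 2018 dataset for brain tumor segmentation. The results show that the proposed mmFormer outperforms state-of-the-art methods for incomplete multimodal brain tumor segmentation on almost all subsets of incomplete modalities, in particular with an average improvement of 19.07% in Dice for tumor segmentation when only one modality is available. The code is available at https://github.com/yaozhang93/mmmenforer.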
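Federated learning (FL) enables distributed clients to learn a shared model for prediction while keeping each client's training data local. However, existing FL requires fully labeled training data, which is inconvenient or sometimes infeasible to obtain because of the high labeling cost and the expertise required. The lack of labels makes FL impractical in many realistic settings. Self-supervised learning can address this challenge by learning from unlabeled data, so that FL can be used widely. Contrastive learning (CL), a self-supervised learning approach, can effectively learn data representations from unlabeled data. However, the distributed data collected on clients are usually not independent and identically distributed (non-IID) across clients, and each client may have only a few classes of data, which degrades the performance of CL and the learned representations. To tackle this problem, we propose a federated contrastive learning framework consisting of two approaches, feature fusion and neighborhood matching, through which a unified feature space among clients is learned for better data representations. Feature fusion provides remote features as accurate contrastive information to each client for better local learning. Neighborhood matching further aligns each client's local features with the remote features, so that well-clustered features among clients can be learned. Extensive experiments show the effectiveness of the proposed framework. It outperforms other methods by 11% on IID data and matches the performance of centralized learning.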
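The role of remote features as additional contrastive information can be illustrated with a standard InfoNCE loss whose negative set is extended by features received from other clients. This is a minimal sketch, assuming that reading of feature fusion. The function names, the temperature value, and the treatment of all remote features as negatives are assumptions, and neighborhood matching is not covered.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE: pull the anchor toward its positive, away from the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

def fused_loss(anchor, positive, local_negatives, remote_features, temperature=0.5):
    """Assumed fusion: remote features from other clients are appended to
    the local negatives, so a client with few classes sees more contrast."""
    negatives = list(local_negatives) + list(remote_features)
    return info_nce(anchor, positive, negatives, temperature)
```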